Hands-On Introduction to Recommender Systems with R

Theme: AI and Society

KT Wong

Faculty of Social Sciences, HKU

2025-07-31

Audience: High school students new to R
Duration: 2 hours
Environment: R and RStudio

Why Recommender Systems Matter

  • Why Interesting?
    • Power apps you use daily
      • Netflix suggests shows
      • TikTok picks videos
      • Amazon recommends products
    • Personalize experiences, making life easier and fun
      • e.g. Spotify suggests songs you love based on your listening habits

Why Recommender Systems Matter

  • Why Important?
    • Drive business and innovation
      • Companies use recommendations to improve user satisfaction
    • Shape society
      • Influence what you watch, buy, or learn
      • e.g. AI study apps suggest further references to help you learn better

Why Recommender Systems Matter

  • Societal Impacts:
    • Benefits
      • Discover new content, save time
        • e.g. finding a great travel app
    • Challenges
      • Privacy concerns
        • e.g. tracking your likes
      • Filter bubbles
        • e.g. biased news feeds
  • Today: Build our own (toy) recommender system in R to see how it works!

Overview

  • This workshop introduces recommender systems using R in RStudio

  • Task: Analyze user ratings for AI-related products (e.g. apps, tools)

    • Learn basic R commands
    • Build a recommender system — explore the underlying mechanism
  • Roadmap

    • Background on Recommender Systems
    • Hands-on tasks
    • Compare collaborative and content-based methods
    • Discuss AI’s societal role
  • Learning Goals:

    • Understand recommender systems in AI
    • Use R for building and evaluating recommendations
    • Connect findings to societal impacts

Background: Recommender Systems and AI in Society

  • What are Recommender Systems?

    • AI systems suggest items based on user preferences
      • e.g., Netflix movies, Amazon products, Spotify songs
    • Types:
      • Collaborative Filtering: Uses user behavior (e.g. ratings)
      • Content-Based: Uses item features (e.g. app categories)
      • Factor Models: Finds hidden patterns in ratings
      • LDA: Uses topic modeling to group items by themes
    • Used in apps, shopping, social media
  • How They Work:

    • Collaborative: Finds similar users/items (e.g. “People like you liked this app”)
    • Content-Based: Matches item features to user interests
    • Factor Models: Breaks ratings into latent factors (e.g. user preferences for education apps)
    • LDA: Identifies topics in items (e.g. StudyApp is “educational”)
  • e.g. If you buy a product (book), get similar product (book) suggestions

  • Focus: how collaborative filtering works; content-based and other methods are introduced briefly as time allows
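The "people like you liked this" idea can be shown in a few lines of R before the hands-on part. This is a toy sketch with made-up ratings (not the workshop dataset): it finds User1's most similar user by cosine similarity and borrows that user's rating for an item User1 has not tried.

```r
# Toy user-based collaborative filtering with made-up ratings:
# find User1's nearest neighbour and borrow their rating for ArtGen.
toy <- rbind(
  User1 = c(StudyApp = 5, ChatBot = 4, ArtGen = NA),
  User2 = c(5, 5, 2),
  User3 = c(1, 2, 5)
)

cosine <- function(a, b) {
  keep <- !is.na(a) & !is.na(b)  # compare only co-rated items
  sum(a[keep] * b[keep]) / (sqrt(sum(a[keep]^2)) * sqrt(sum(b[keep]^2)))
}

sims <- c(User2 = cosine(toy[1, ], toy[2, ]),
          User3 = cosine(toy[1, ], toy[3, ]))
nearest <- names(which.max(sims))   # the user who rates most like User1
toy[nearest, is.na(toy[1, ])]       # borrow that user's rating for ArtGen
```

User2's ratings point in almost the same direction as User1's, so User2's opinion of ArtGen stands in for User1's missing rating; this is exactly what UBCF does at scale in Step 4.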

Hands-on Workshop

Step 1: Introduction and Setup

Objective: Set up RStudio

  • Task 1.1: Open RStudio
    • Open RStudio, create new R script: File > New File > R Script
  • Task 1.2: Install Packages
Code
install.packages(c("tidyverse", "recommenderlab", "proxy", "topicmodels", "text2vec"))

Step 2: Loading Tools and Data

Objective: Load packages and dataset

  • Dataset: 30 users rating 10 AI products (1–5 scale)
Code
library(tidyverse)
library(recommenderlab)
library(proxy)


set.seed(123)  # so everyone generates the same simulated ratings

# NB: sample() rescales the prob weights, so roughly 70% of cells are NA
ratings <- matrix(sample(c(NA, 1:5), 300, replace = TRUE, prob = c(0.7, 0.1, 0.1, 0.1, 0.05, 0.05)), 
                 nrow = 30, ncol = 10)

colnames(ratings) <- c("StudyApp", "ChatBot", "ArtGen", "CodeTool", "VoiceAI", 
                       "HealthAI", "GameAI", "MusicAI", "CarAI", "TutorAI")

rownames(ratings) <- paste0("User", 1:30)

ratings <- as(ratings, "realRatingMatrix")
  • Task 2.1: Run code (Ctrl+Enter)

  • Task 2.2: Take a look at the ratings matrix

    • Run View(as(ratings, "matrix"))

Step 3: Exploring the Dataset

Objective: Understand dataset structure

  • The dataset is a matrix of user ratings for AI products

  • Task 3.1: Check dimensions and product names

    • Run dim(ratings)
    • Run colnames(ratings)
Code
dim(ratings)
colnames(ratings)
  • Task 3.2: Look at a subset of the matrix
    • Run as(ratings, "matrix")[1:3, 1:3]
  • Task 3.3: Which user rated the most products?
    • Run sort(rowCounts(ratings), decreasing = TRUE)[1:5]

Step 4: Building a Collaborative Filtering Model

Objective: Create user-based collaborative filtering model

  • Recommends products based on similar users’ ratings
Code
recommender_ubcf <- Recommender(ratings, method = "UBCF", 
                               param = list(normalize = "center", 
                                            method = "cosine"))
  • Task 4.1: Run code

  • Task 4.2: Check model details

    • Run getModel(recommender_ubcf)$description

Step 5: Generating Recommendations

Objective: Predict recommendations for users

  • Suggest top AI products for users
Code
predictions_ubcf <- predict(recommender_ubcf, ratings[1:5], n = 2)

pred_list_ubcf <- as(predictions_ubcf, "list")
  • Task 5.1: See User1’s recommendations
    • Run pred_list_ubcf[[1]]
  • Task 5.2: Check recommendations for User2
    • Run pred_list_ubcf[[2]]

Step 6: Content-Based Approach

Objective: Compare with content-based filtering

  • Use product features
    • e.g., categories: Education, Health
Code
# Product features (simplified: 1 = has feature, 0 = doesn’t)

features <- matrix(c(
  1, 0, 0, 1, 0, 0, 0, 0, 0, 1,  # StudyApp: Education, Coding, Tutoring
  0, 1, 0, 0, 1, 0, 0, 0, 0, 0,  # ChatBot: Interaction, Voice
  0, 0, 1, 0, 0, 0, 1, 1, 0, 0,  # ArtGen: Creativity, Gaming, Music
  1, 0, 0, 1, 0, 0, 0, 0, 0, 0,  # CodeTool: Education, Coding
  0, 1, 0, 0, 1, 0, 0, 0, 0, 0,  # VoiceAI: Interaction, Voice
  0, 0, 0, 0, 0, 1, 0, 0, 0, 0,  # HealthAI: Health
  0, 0, 1, 0, 0, 0, 1, 0, 0, 0,  # GameAI: Creativity, Gaming
  0, 0, 1, 0, 0, 0, 0, 1, 0, 0,  # MusicAI: Creativity, Music
  0, 0, 0, 0, 0, 0, 0, 0, 1, 0,  # CarAI: Driving
  1, 0, 0, 0, 0, 0, 0, 0, 0, 1   # TutorAI: Education, Tutoring
  ), 
  nrow = 10, byrow = TRUE)

colnames(features) <- c("Education", "Interaction", "Creativity", "Coding", 
                        "Voice", "Health", "Gaming", "Music", "Driving", "Tutoring")

rownames(features) <- colnames(ratings)

# Cosine similarity for content-based recommendations
sim_matrix <- as.matrix(simil(features, method = "cosine"))
diag(sim_matrix) <- 1  # self-similarity, so [2] below picks the nearest *other* item

content_recs <- apply(sim_matrix, 1, function(x) names(sort(x, decreasing = TRUE)[2]))
  • Task 6.1: Check “StudyApp”
    • Run content_recs["StudyApp"]
  • Task 6.2: Check “ChatBot”
    • Run content_recs["ChatBot"]
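To demystify what simil() computes above, here is the same cosine similarity worked out by hand for the StudyApp and CodeTool rows of the feature matrix in this step: count the shared features, then divide by the lengths of the two vectors.

```r
# Cosine similarity by hand for two Step 6 feature rows:
# StudyApp (Education, Coding, Tutoring) vs. CodeTool (Education, Coding)
study_app <- c(1, 0, 0, 1, 0, 0, 0, 0, 0, 1)
code_tool <- c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0)

dot   <- sum(study_app * code_tool)                       # 2 shared features
norms <- sqrt(sum(study_app^2)) * sqrt(sum(code_tool^2))  # sqrt(3) * sqrt(2)
dot / norms                                               # 2 / sqrt(6) ~ 0.816
```

A value near 1 means the items share most of their features; unrelated items like StudyApp and CarAI share none, so their cosine is 0.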

Step 7: Evaluating Recommendations

Objective: Assess recommendation quality

  • Check if predictions match user interests
Code
# Split data for evaluation
train <- ratings[1:20]
test <- ratings[21:30]

recommender <- Recommender(train, method = "UBCF", 
                          param = list(normalize = "center", 
                                       method = "cosine"))

pred_test <- predict(recommender, test, n = 2)
  • Task 7.1: Check User21’s predictions
    • Run as(pred_test, "list")[[1]]
  • Task 7.2: Compare to actual ratings
    • Compare to as(test, "matrix")[1, ]
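Eyeballing predictions one user at a time is instructive, but recommenderlab also ships evaluation helpers that automate this split-and-compare idea. The sketch below runs on a fresh toy matrix (results depend on the random data, and the train, given, and goodRating values are illustrative choices): it hides all but 2 ratings per test user, predicts the hidden ones, and reports the error.

```r
library(recommenderlab)

# Fresh toy data, denser than the workshop matrix so users have enough ratings
set.seed(123)
m <- matrix(sample(c(NA, 1:5), 300, replace = TRUE,
                   prob = c(0.5, 0.1, 0.1, 0.1, 0.1, 0.1)),
            nrow = 30, ncol = 10,
            dimnames = list(paste0("User", 1:30), paste0("Item", 1:10)))
r <- as(m, "realRatingMatrix")
r <- r[rowCounts(r) >= 4, ]  # evaluation needs users with several ratings

# Hold out ratings: keep 2 per test user as "known", hide the rest
scheme <- evaluationScheme(r, method = "split", train = 0.8,
                           given = 2, goodRating = 4)

rec  <- Recommender(getData(scheme, "train"), method = "UBCF")
pred <- predict(rec, getData(scheme, "known"), type = "ratings")

acc <- calcPredictionAccuracy(pred, getData(scheme, "unknown"))
acc  # named vector with RMSE, MSE, MAE
```

Lower RMSE means the predicted ratings sit closer to the ratings the users actually gave, which is a more systematic check than comparing lists by eye.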

Step 8: Visualizing and Discussing Results

Objective: Visualize recommendations, compare methods, discuss societal impact

  • Plot top recommendations; compare collaborative vs. content-based

  • Collaborative Filtering Plot:

Code
top_recs <- as(predictions_ubcf, "list")[[1]]

rec_data <- tibble(Product = top_recs, Score = 1:length(top_recs))

ggplot(rec_data, aes(x = reorder(Product, Score), y = Score)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(title = "Top Recommendations for User1 (Collaborative)", x = "Product", y = "Rank") +
  theme_minimal()
  • Comparison Plot
Code
comparison <- tibble(
  Product = c(pred_list_ubcf[[1]], content_recs[1:2]),
  Method = rep(c("Collaborative", "Content-Based"), each = 2),
  Rank = rep(1:2, 2)
  )

ggplot(comparison, 
       aes(x = Product, y = Rank, fill = Method)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Recommendation Comparison", 
       x = "Product", 
       y = "Rank") +
  scale_fill_manual(values = c("#1f77b4", "#ff7f0e")) +
  theme_minimal()
  • Task 8.1: Run collaborative plot

    • Note User1’s top product
  • Task 8.2: Run comparison plot

    • Compare methods for StudyApp

  • Task 8.3: Check differences

    • Run tibble(Collaborative = pred_list_ubcf[[1]], ContentBased = content_recs[pred_list_ubcf[[1]]])

Step 9: Factor Models for Recommendations

Objective: Use matrix factorization for recommendations

  • Breaks ratings into latent factors (e.g., user preferences for education apps)
Code
recommender_svd <- Recommender(ratings, method = "SVD", 
                              param = list(k = 5))  # 5 latent factors

predictions_svd <- predict(recommender_svd, ratings[1:5], n = 2)

pred_list_svd <- as(predictions_svd, "list")
  • Task 9.1: Check User1’s SVD recommendations
    • Run pred_list_svd[[1]]
  • Task 9.2: Compare User1’s UBCF vs. SVD
    • Run c(UBCF = pred_list_ubcf[[1]], SVD = pred_list_svd[[1]])
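To make "latent factors" concrete, base R's svd() can decompose a tiny complete rating matrix by hand. This illustrates the idea rather than reproducing recommenderlab's SVD method step for step: the made-up 3×3 matrix has two "taste groups", so keeping only the two strongest factors already reproduces it closely.

```r
# A tiny complete rating matrix: users 1-2 like items 1-2, user 3 likes item 3
m <- matrix(c(5, 4, 1,
              4, 5, 2,
              1, 2, 5), nrow = 3, byrow = TRUE)

s <- svd(m)  # m = u %*% diag(d) %*% t(v)
s$d          # factor strengths, largest first

# Rank-2 approximation: keep only the 2 strongest latent factors
approx <- s$u[, 1:2] %*% diag(s$d[1:2]) %*% t(s$v[, 1:2])
round(approx, 1)  # close to m: two factors capture most of the structure
```

In a real recommender, the discarded weak factors are mostly noise, and the positions the approximation fills in for missing ratings become the predictions.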

Step 10: LDA for Recommendations

Objective: Use Latent Dirichlet Allocation for recommendations

  • Models items as topics (e.g., “educational” or “creative”) based on ratings
Code
library(topicmodels)

# Convert ratings to a plain 0/1 matrix (rated = 1, not rated = 0) for LDA
ratings_binary <- as(ratings, "matrix")
ratings_binary[!is.na(ratings_binary)] <- 1
ratings_binary[is.na(ratings_binary)] <- 0
ratings_binary <- ratings_binary[rowSums(ratings_binary) > 0, ]  # LDA needs a rating per row

# Run LDA with 5 topics (users play the role of "documents", products of "words")
lda_model <- LDA(ratings_binary, k = 5, control = list(seed = 123))

# Find User1's strongest topic, then the two products most associated with it
user_topics <- posterior(lda_model)$topics  # users x topics
topic_terms <- posterior(lda_model)$terms   # topics x products
top_topic <- which.max(user_topics[1, ])
lda_recs <- names(sort(topic_terms[top_topic, ], decreasing = TRUE))[1:2]
  • Task 10.1: Run lda_recs
  • Task 10.2: Compare User1’s UBCF vs. LDA
    • Run c(UBCF = pred_list_ubcf[[1]], LDA = lda_recs)

Step 11: Visualizing and Comparing Methods

Objective: Visualize recommendations, compare methods, discuss societal impact

  • Plot top recommendations; compare UBCF, SVD, and LDA

  • Collaborative Filtering Plot:

Code
top_recs_ubcf <- pred_list_ubcf[[1]]

rec_data <- tibble(Product = top_recs_ubcf, Score = 1:length(top_recs_ubcf))

ggplot(rec_data, 
       aes(x = reorder(Product, Score), y = Score)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(title = "Top Recommendations for User1 (UBCF)", 
       x = "Product", y = "Rank") +
  theme_minimal()
  • Comparison Plot:
Code
comparison <- tibble(
  Product = c(pred_list_ubcf[[1]],  pred_list_svd[[1]], lda_recs),
  Method = rep(c("UBCF", "SVD", "LDA"), each = 2),
  Rank = rep(1:2, 3)
)

ggplot(comparison,
       aes(x = Product, y = Rank, fill = Method)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Recommendation Comparison for User1", x = "Product", y = "Rank") +
  scale_fill_manual(values = c("#1f77b4", "#ff7f0e", "#2ca02c")) +
  theme_minimal()
  • Task 11.1: Run UBCF plot. Note User1’s top product
    • Run top_recs_ubcf
  • Task 11.2: Run comparison plot. Compare methods for StudyApp
    • Run comparison
  • Task 11.3: Check differences
    • Run tibble(UBCF = pred_list_ubcf[[1]], SVD = pred_list_svd[[1]], LDA = lda_recs)
      • StudyApp: UBCF (user similarity), SVD (latent factors), LDA (topic-based)
      • ChatBot: May differ due to thematic grouping in LDA

Discussion

  • Why do methods differ? (UBCF: user similarity, SVD: hidden patterns, LDA: themes)

  • Societal impacts: Personalization vs. privacy, filter bubbles

  • How can recommendations improve AI use in society?